Include offending series labels in missing-metric-name push errors #7657
Open
anxkhn wants to merge 1 commit into
When a pushed series has no __name__ label, the error returned to the client only said "no metric name label" / "sample missing metric name" with no indication of which series caused it. Operators had to bisect by stopping Prometheus instances one by one to find the culprit.

Enrich both user-facing push paths so the offending series labels are always included:

- noMetricNameError now carries the series and renders it via formatLabelSet, matching every sibling validation error in the same file. This covers the validation (400) path used when metric name enforcement is enabled.
- The fatal error from tokenForLabels (the shard_by_all_labels=false path that produced the original 500 with no context) is wrapped with the series labels.

Fixes cortexproject#5802

Signed-off-by: Anas Khan <83116240+anxkhn@users.noreply.github.com>
What this PR does:
When a pushed series has no `__name__` label, the error Cortex returned to the remote-write client gave no indication of which series was at fault, only `no metric name label` (HTTP 500) or `sample missing metric name` (HTTP 400). As the issue reporter described, the only way to find the culprit was to bisect by stopping Prometheus instances one at a time.

This enriches both user-facing push paths so the offending series labels are always included in the error:

- `noMetricNameError` now carries the series and renders it via `formatLabelSet`, exactly like every sibling validation error in `pkg/util/validation/errors.go` (`tooManyLabelsError`, `labelValueTooLongError`, the native-histogram errors, etc.). This is the validation path (HTTP 400) taken when metric-name enforcement is enabled (the default).
- `tokenForLabels` in the distributor (the `shard_by_all_labels=false` path that produced the original context-less HTTP 500) is wrapped with the series labels via `cortexpb.FromLabelAdaptersToMetric(ts.Labels).String()`.

Example, before vs after (validation path):
Which issue(s) this PR fixes:
Fixes #5802
Checklist
- `CHANGELOG.md` updated
- `docs/configuration/v1-guarantees.md` updated if this PR introduces experimental flags (N/A - no new flags)